Exploratory Analysis: MFPCA for patches disturbed between 2015 and 2040 for all scenarios together


General questions before the start:


In order to consider the multivariate data structure at hand when running a FPCA, a multivariate FPCA (MFPCA) is conducted using the R-package MFPCA. Now, a single multiFunData object is created which consists of one funData object (package funData) which is three-dimensional: the first dimension represents the grid cells. We have 434, 442, 462 and 465 disturbed grid cells in scenarios Control, SSP1-RCP2.6, SSP3-RCP7.0 and SSP5-RCP8.5, respectively. This adds up to 1803 grid cells in total. The second dimension represents the Year after Disturbance and the third one the PFT, numbered from 1 to 5. Since we are looking at four independent runs of the vegetation model LPJ-GUESS, the grid cells disturbed in each scenario may vary or may coincide. In total, 1577 unique grid cells are disturbed, where 226 of them are disturbed in at least two scenarios.

The MFPCA is conducted with 10 principal components in order to be consistend with the univariate FPCAs conducted before for each scenario and PFT. Note that the MFPCA package does not provide an implementation of reconstructing the initial curves for multi-dimensional data.

Questions here: Is there a mathematical or computational reason, why this is not implemented? Would it be worth the effort to implement it myself?

In the following, we are going to see the principal components, now as heatmaps, a plot visualizing the first two principal components and the results of a 4-means clustering.


Principal Components

As a first step, let’s take a look at the principal components themselves. The following table shows the variability in per cent that the first 10 principal components account for in each scenario:

PC 1 PC 2 PC 3 PC 4 PC 5 PC 6 PC 7 PC 8 PC 9 PC 10
Variability 91.22 7.49 0.52 0.33 0.21 0.1 0.06 0.04 0.01 0.01

Note that this time, the values truly represent the accounted variation, not only the relative part. Obviously, the first principal component already captures nearly all of the variability in the data. The screeplot depicted in Figure 1 underlines this result:

Figure 1: Screeplot of the MFPCA.
Figure 1: Screeplot of the MFPCA.

In order to investigate changes in the values of the first 10 principal components, the following Figure 2 shows how positive (first plot) and negative deviations (second plot) to the mean act on the PC scores. Note that these plots only cover differences, not the mean itself. Values smaller than the mean are displayed in different shades of blue, while values above the mean are displayed in red.










Figure 2: First 10 principal components. Here, the y-axis represents PFTs (Tundra (1), Needleleaf Evergreen (2), Pioneering Broadlef (3), other Conifers (4) and Temperate Broadleaf (5)), and the x-axis the Year after Disturbance. The first row of each PC indicates positive deviations to the mean while the second row visualizes negative deviations to the mean. Note that these plots only cover differences, not the mean itself.
Figure 2: First 10 principal components. Here, the y-axis represents PFTs (Tundra (1), Needleleaf Evergreen (2), Pioneering Broadlef (3), other Conifers (4) and Temperate Broadleaf (5)), and the x-axis the Year after Disturbance. The first row of each PC indicates positive deviations to the mean while the second row visualizes negative deviations to the mean. Note that these plots only cover differences, not the mean itself.

The interpretation for the first PC might be the following:

So in total, high values in the first component indicate a lower share of Needleleaf Evergreen, a higher share of Pioneering Broadleaf and a higher share of Tundra in the beginning of the study period than the mean. Other Conifers and Temperate Broadleaf do not substantially contribute to the dynamics.

Clustering

In order to group the curves into clusters, the first 10 PC scores are clustered by a k-means algorithm. In order to evaluate an appropriate number of clusters, Figure 3 shows an elbow plot for \(k \in \{1,...,10\}\):

Figure 3: Elbow plot for k-means clustering for k \in \{1,...,10\}.
Figure 3: Elbow plot for k-means clustering for \(k \in \{1,...,10\}\).

The elbow plot would indicate an elbow at \(k=2\). However, given the data structure with four scenarios, in the following, \(k=4\) is chosen. The four clusters are rather unbalanced and dominated by two large clusters, namely clusters 3 and 4, as the following table indicates:

Number of curves in each cluster after 126 years of recovery
Control SSP1-RCP2.6 SSP3-RCP7.0 SSP5-RCP8.5 Sum
Cluster 1 76 46 56 67 245
Cluster 2 75 54 63 47 239
Cluster 3 231 166 142 126 665
Cluster 4 52 176 201 225 654
Sum 434 442 462 465 1803

In order to get a first impression of the cluster-specific PFT-behavior, Figure 4 shows the PFT-wise mean shares of above ground carbon over the first 100 years of recovery. Note that here, the clusters are derived using the whole time span of 126 years of recovery and all scenarios are combined.


Figure 4: PFT-wise mean shares of above ground carbon over time. Note that the values are averages over locations and clusters.
Figure 4: PFT-wise mean shares of above ground carbon over time. Note that the values are averages over locations and clusters.

The plot indicates that cluster 1 is mainly driven by Temperate Broadleaf, while Pioneering Broadleaf is the dominant species in clusters 2 and 4. These clusters differ in terms of other PFTs: in cluster 4, after about 60 years of recovery, hardly anything but Pioneering Broadleaf is present, whereas in cluster 2, other needleleafed trees are present as well. Cluster 3 is dominated by Needleleaf Evergreen, followed by Conifers (others). Note that Tundra does not play an important role here.

Cluster development over time

Theory in science of vegetation states that the vegetation composition after disturbance is already set only shortly after the disturbance. This would mean in statistical terms, that the cluster assignment should be rather similar no matter how many years of recovery after disturbance are considered. Figure 5 shows the cluster assignments for all four scenarios over different years after disturbance:


Figure 5: Cluster composition over years after disturbance for all four scenarios.
Figure 5: Cluster composition over years after disturbance for all four scenarios.

We can clearly see that the final composition of clusters is already (nearly) reached after about 60 years depending on the scenario. Drastic changes are especially in the first 20 years and the clusters quickly stabilize in their core grid cells.

Note that Lucia could not detect these patterns in her clustering, so she assumed that this mechanism is probably not covered by the simulation model LPJ-GUESS. This result here however could be an indication for that phenomenon in the simulation.

In order to further investigate these patterns, Figure 6 shows the cluster-wise means after different time spans of vegetation. Here, the clusters after 5, 10, 20, …, 120, 126 years are taken and evaluated by their mean share of above ground carbon over grid cells and clusters. This means in particular that in each considered time step, the cluster assignment could be changed (in contrast to Figure 4).


Figure 6: Cluster-wise mean shares of above ground carbon over different time spans. Note that the values are averages over locations and clusters.
Figure 6: Cluster-wise mean shares of above ground carbon over different time spans. Note that the values are averages over locations and clusters.

We can see that (depending on the PFT), the dominance of specific PFTs establishes after only a few decades.

Possible next steps: Further investigate the clusters, also considering climate data: are there any thresholds that define the clusters precisely (e.g. above a certain temperature, we are in cluster 4)?

PC 1 vs. PC 2

Figure 7 shows the first two principal components plotted against each other. The colors in the left plot visualize the clusters, while in the right plot, they indicate the scenario.

Figure 7: PC 1 vs. PC 2 for the MFPCA for grid cells which are disturbed in one scenario only. The left plot shows the resulting clusters using a 4-means clustering algorithm, the right plot shows the scenarios.
Figure 7: PC 1 vs. PC 2 for the MFPCA for grid cells which are disturbed in one scenario only. The left plot shows the resulting clusters using a 4-means clustering algorithm, the right plot shows the scenarios.

We can see that the clusters found are clearly distinguishable by the first two PCs and that the clusters derived by 4-means find patterns which are beyond the partitioning into scenarios.

Clustered curves per PFT and scenario

Next, to further investigate the cluster assignment, in the following, the clusters are displayed for each PFT and scenario separately. Let’s start with Tundra.

Tundra

Figure 8 shows the four clusters for scenario Control. Cluster 3 represents the largest cluster with nearly half of the grid cells inside. Cluster 1 covers those curves with a less steep decrease in above ground carbon. Cluster 2 and 4 are rather similar.


Figure 8: Clustered curves for scenario Control and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 8: Clustered curves for scenario Control and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Next, Figure 9 shows the equivalent plot for scenario SSP1-RCP2.6. Here, the two dominating clusters 3 and 4 are rather similar, while clusters 1 and 2 cover all grid cells with a (second) peak after 60 years of regeneration.


Figure 9: Clustered curves for scenario SSP1-RCP2.6 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 9: Clustered curves for scenario SSP1-RCP2.6 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 10 (scenario SSP3-RCP7.0) shows a similar pattern: there are no substantial differences between the clusters.


Figure 10: Clustered curves for scenario SSP3-RCP7.0 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 10: Clustered curves for scenario SSP3-RCP7.0 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Finally, Figure 11 shows the clustered curves for the most drastic scenario SSP5-RCP8.5. Again, no clear differences are visible by visual inspection.


Figure 11: Clustered curves for scenario SSP5-RCP8.5 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 11: Clustered curves for scenario SSP5-RCP8.5 and PFT Tundra. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

In total, PFT Tundra seems not to be dominant for cluster building.


Needleleaf Evergreen

Second, let’s take a look at PFT Needleleaf Evergreen. Figure 12 shows the clustered curves for scenario Control:


Figure 12: Clustered curves for scenario Control and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 12: Clustered curves for scenario Control and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Here, some obvious patterns are detected by the clustering: while clusters 1 and 2 cover mostly mediate shares of above ground carbon, cluster 3, the dominant cluster, represents grid cells with a rather high share. Cluster 4 on the other hand represents those curves with a small peak in the beginning and a rather low share in general.

Figure 13 shows the equivalent plot for scenario SSP1-RCP2.6:


Figure 13: Clustered curves for scenario SSP1-RCP2.6 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 13: Clustered curves for scenario SSP1-RCP2.6 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Here, the same patterns as for the Control scenario become apparent: the two dominant clusters 3 and 4 represent very high and very low shares of above ground carbon, respectively, while cluster 1 and 2 cover mediate shares. Note that in contrast to before, now, we have two dominant clusters.


Figure 14: Clustered curves for scenario SSP3-RCP7.0 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 14: Clustered curves for scenario SSP3-RCP7.0 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 14 and Figure 15 visualize the clustering for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. Again, the same dynamics as for the first warming scenario are present in the data.


Figure 15: Clustered curves for scenario SSP5-RCP8.5 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 15: Clustered curves for scenario SSP5-RCP8.5 and PFT Needleleaf Evergreen. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

In total, clusters 1 and 2 are mainly driven by mediate shares of above ground carbon of Needleleaf Evergreen, while clusters 3 and 4 mainly reflect rather high and low shares, respectively. Therefore, PFT Needleleaf Evergreen seems to be an important component in the clustering process.


Pioneering Broadleaf

Figure 16 shows the clustered curves for scenario Control and PFT Pioneering Broadleaf.


Figure 16: Clustered curves for scenario Control and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 16: Clustered curves for scenario Control and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Here, clusters 1 and 3 represent rather low shares of above ground carbon, while clusters 2 and 4 cover higher shares. Note that clusters 2 and 4 differ in the increase dynamic, i.e. the increase in share in cluster 4 is more steep and with a higher peak on average than in cluster 2. Cluster 3 is the dominant cluster.

Figure 17 shows the equivalent plot for scenario SSP1-RCP2.6:


Figure 17: Clustered curves for scenario SSP1-RCP2.6 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 17: Clustered curves for scenario SSP1-RCP2.6 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

The two dominant clusters 3 and 4 represent mostly very low and very high shares in above ground carbon, respectively. The two remaining clusters, clusters 1 and 2, cover moderate low and moderate high shares, respectively.


Figure 18: Clustered curves for scenario SSP3-RCP7.0 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 18: Clustered curves for scenario SSP3-RCP7.0 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 18 and 19 show the dynamics for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. Again, similar patterns are obvious, but with an interior dynamic: the more the temperature rises, the less clear the clustering. For example, cluster 3, which represents mainly rather low shares, tend to have more and more curves with higher shares. The pattern in cluster 2, i.e. the behavior in increasing changes and becomes less uniform.


Figure 19: Clustered curves for scenario SSP5-RCP8.5 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 19: Clustered curves for scenario SSP5-RCP8.5 and PFT Pioneering Broadleaf. The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

In total, Pioneering Broadleaf seems to be a major driver of cluster building.


Conifers (others)

Figure 20 shows the clustering dynamics for scenario Control and PFT Conifers (others). Some minor differences are present: while the dominating cluster 3 and cluster 2 mainly cover grid cells with a high share of above ground carbon, clusters 1 and 4 represent those cells with a maily lower share. Nite that the dynamics in these clusters differ in terms of the increase and timing of the peak.


Figure 20: Clustered curves for scenario Control and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 20: Clustered curves for scenario Control and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 21 shows the equivalent plot for scenario SSP1-RCP2.6. Now, the two dominant clusters 3 and 4 cover mainly grid cells with a high and low share of above ground carbon, respectively, while clusters 1 and 2 cover the mediate shares.


Figure 21: Clustered curves for scenario SSP1-RCP2.6 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 21: Clustered curves for scenario SSP1-RCP2.6 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 22 and Figure 23 show the same dynamics for scenarios SSP3-RCP7.0 and SSP5-RCP8.5, respectively. The dominating clusters 3 and 4 represent high and low shares of above ground carbon, while the two remaining clusters cover mediate shares.


Figure 22: Clustered curves for scenario SSP3-RCP7.0 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 22: Clustered curves for scenario SSP3-RCP7.0 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

Figure 23: Clustered curves for scenario SSP5-RCP8.5 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.
Figure 23: Clustered curves for scenario SSP5-RCP8.5 and PFT Conifers (others). The colored curves indicate the belonging to the respective cluster. The dark curves represent the cluster-specific mean functions.

In summary, PFT Conifers (others) seems to mainly affect clusters 3 and 4, but is a minor factor in clustering in general.


Temperate Broadleaf

Lastly, let’s take a look at PFT Temperate Broadleaf. Figure 24 shows the clustered curves for scenario Control:


Figure 24: Temperate Broadleaf - Control.
Figure 24: Temperate Broadleaf - Control.

In the Control scenario, only a few grid cells have a non-zero share of above ground carbon. Due to this lack of data, the clustering should not be overrated.


Figure 25: Temperate Broadleaf - SSP1-RCP2.6
Figure 25: Temperate Broadleaf - SSP1-RCP2.6

Figure 25 shows the equivalent plot for scenario SSP1-RCP2.6. Clearly, some dependencies are apparent: clusters 1 and 2 comprise of almost all of the non-zero curves, but reflect both a different dynamic in the increase.


Figure 26: Temperate Broadleaf - SSP3-RCP7.0
Figure 26: Temperate Broadleaf - SSP3-RCP7.0

With increase in temperature, Temperate Broadleaf becomes more and more important. This is also visible in Figure 24, where the clustered curves for scenario SSP3-RCP7.0 are displayed. In all four clusters there are a couple of non-zero curves, with most of them being in cluster 1, the smallest cluster.


Figure 27: Temperate Broadleaf - SSP5-RCP8.5
Figure 27: Temperate Broadleaf - SSP5-RCP8.5

A similar pattern is apparent in the data for scenario SSP5-RCP8.5 depicted in Figure 27. Cluster 1 is the only cluster with representing some dynamic in the curves.

In total, PFT Temperate Broadleaf hardly affects the clustering algorithm.


Summary of the Results

To conclude and bring together all the results, here, the effects of the PFTs on the clusters are summarized for each scenario.

Control:

SSP1-RCP2.6:

SSP3-RCP7.0:

SSP5-RCP8.5:


In total, the clusters are especially influenced by PFTs Needleleaf Evergreen, Pioneering Broadleaf and other Conifers, while PFTs Temperate Broadleaf and Tundra seem to be less crucial for cluster forming.